For those who want to try simulating things in real time:
scan the QR code, which points to https://github.com/sitalaura/link-functions/tree/main/R
or download the file at sitalaura.github.io/link-functions/R/datasim.R
independent variable: age in years (years)
dependent variable: (variable)
using the classical linear predictor
What we don't see, because it is a default argument hidden in our code:
the model uses the gaussian family with the identity link function
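The hidden default can be made visible with a minimal sketch. The data are simulated purely for illustration (the names `d_demo`, `years` and `y`, the coefficients and the seed are assumptions, not the poster's data); the only point is that `glm()` called without a `family` argument gives the same fit as an explicit `gaussian(link = "identity")`.

```r
# Illustrative data only: names, sample size and coefficients are assumptions.
set.seed(1)
d_demo <- data.frame(years = runif(200, 6, 12))
d_demo$y <- 2 + 0.5 * d_demo$years + rnorm(200)

# Family left at its default ...
fit_default <- glm(y ~ years, data = d_demo)
# ... and the same model with the default spelled out.
fit_explicit <- glm(y ~ years, data = d_demo,
                    family = gaussian(link = "identity"))

family(fit_default)                               # reports gaussian / identity
all.equal(coef(fit_default), coef(fit_explicit))  # TRUE
```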
The link function in a GLM connects the expected value of the response variable Y to the linear predictor built from X;
its inverse re-maps the linear predictor onto the appropriate range of Y
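As a small illustration of this re-mapping, base R's `make.link()` returns both directions of a link. The sketch uses the logit link as an example; the input values are arbitrary.

```r
# make.link() is part of base R (package stats).
lg <- make.link("logit")

eta <- c(-5, 0, 5)        # arbitrary values on the linear-predictor scale
mu  <- lg$linkinv(eta)    # inverse link: any real number -> (0, 1)
mu
lg$linkfun(mu)            # link: back from (0, 1) to the linear-predictor scale

# The identity link leaves the linear predictor untouched.
make.link("identity")$linkinv(eta)
```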
independent variable: age in years (years)
dependent variable: mistakes in a TRUE/FALSE task (accuracy)
using the classical linear predictor
Does this model help us predict the data?
No, because the new data simulated from the model clearly fall outside the 0 to 1 range of possible accuracy values
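This check can be sketched in a few lines. The data below are simulated for illustration only: the sample size, the number of trials `k`, the coefficients and the object names are assumptions and do not reproduce datasim.R.

```r
# Illustrative simulation: all names and values are assumptions.
set.seed(2026)
n <- 1000                                  # observations
k <- 20                                    # trials per observation
age <- rnorm(n)                            # standardized age
p_true <- plogis(2.5 + 1 * age)            # high accuracy, close to ceiling
accuracy <- rbinom(n, k, p_true) / k       # observed proportion correct
d_sim <- data.frame(age, accuracy)

# Gaussian family with identity link (the glm() default)
fit_gauss <- glm(accuracy ~ age, data = d_sim)

# New data simulated from the fitted model
new_acc <- simulate(fit_gauss, nsim = 1)[[1]]
range(new_acc)
mean(new_acc < 0 | new_acc > 1)            # share of impossible accuracy values
```

Because the Gaussian model places no bounds on the response, part of the simulated accuracies ends up above 1.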
In the first example an identity link was appropriate because the dependent variable y spans from -inf to +inf.
Here an identity link is NOT appropriate because y (accuracy) spans from 0 to 1.
In this case, link="logit" makes sure that the predicted y stays between 0 and 1.
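A minimal sketch of how the logit link keeps predictions in range. The data are simulated for illustration only (sample size, `k`, coefficients and object names are assumptions, not the poster's data).

```r
# Illustrative simulation: all names and values are assumptions.
set.seed(2026)
n <- 1000; k <- 20
age <- rnorm(n)
accuracy <- rbinom(n, k, plogis(2.5 + 1 * age)) / k
d_sim <- data.frame(age, accuracy)

# Binomial family with logit link; weights carry the number of trials
fit_logit <- glm(accuracy ~ age, data = d_sim,
                 family = binomial(link = "logit"),
                 weights = rep(k, nrow(d_sim)))

p_hat   <- fitted(fit_logit)           # inverse logit of the linear predictor
new_acc <- rbinom(n, k, p_hat) / k     # new data drawn from the fitted model
range(p_hat)
range(new_acc)
```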
independent variable: age in years (years)
dependent variable: mistakes in a TRUE/FALSE task (accuracy)
adding a new main effect
groups: typically developing children (group = 0)
children with dyslexia (group = 1)
A positive interaction emerges (gaussian family, identity link):
Call:
glm(formula = accuracy ~ age * group, data = d)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.948053   0.002561  370.13   <2e-16 ***
age          0.049945   0.002169   23.02   <2e-16 ***
group1      -0.089275   0.003670  -24.32   <2e-16 ***
age:group1   0.064539   0.003108   20.77   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 0.003361547)
Null deviance: 16.3436 on 999 degrees of freedom
Residual deviance: 3.3481 on 996 degrees of freedom
AIC: -2851.5
Number of Fisher Scoring iterations: 2
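A negative interaction emerges (binomial family, logit link):
# accuracy is a proportion, so weights supply the number of trials (k, presumably trials per observation)
fit <- glm(accuracy ~ age * group, data = d, family = binomial(link = "logit"), weights = rep(k, nrow(d)))
summary(fit)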
Call:
glm(formula = accuracy ~ age * group, family = binomial(link = "logit"),
data = d, weights = rep(k, nrow(d)))
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.21231    0.06939  60.706  < 2e-16 ***
age          1.63490    0.04790  34.135  < 2e-16 ***
group1      -1.72553    0.07601 -22.700  < 2e-16 ***
age:group1  -0.38298    0.05365  -7.139 9.41e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8876 on 999 degrees of freedom
Residual deviance: 1104 on 996 degrees of freedom
AIC: 3186
Number of Fisher Scoring iterations: 5
I did not simulate an interaction, so BOTH models find an interaction that is not there.
Let’s try the probit link for m-alternative forced-choice tasks, with m = 2 because a TRUE/FALSE task has a 50% guessing rate.
No interaction emerges, as it should!
library(psyphy)  # provides mafc.probit()
fit <- glm(accuracy ~ age * group, data = d, family = binomial(link = mafc.probit(.m = 2)), weights = rep(k, nrow(d)))
summary(fit)
Call:
glm(formula = accuracy ~ age * group, family = binomial(link = mafc.probit(.m = 2)),
data = d, weights = rep(k, nrow(d)))
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.91347    0.03466  55.201   <2e-16 ***
age          0.93311    0.02790  33.441   <2e-16 ***
group1      -0.96005    0.03983 -24.104   <2e-16 ***
age:group1   0.05339    0.03510   1.521    0.128
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8876.00 on 999 degrees of freedom
Residual deviance: 958.75 on 996 degrees of freedom
AIC: 3040.8
Number of Fisher Scoring iterations: 6
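Why the three models disagree on the interaction can be sketched with plain arithmetic, without fitting anything. The block writes out by hand the inverse link that, as we understand it, a 2-alternative forced-choice probit implements, p = 1/m + (1 - 1/m) * pnorm(eta). The helper names `mafc_inv`, `eta`, `p` and `did` are hypothetical, the coefficients are the main effects from the output above rounded to two decimals, and the comparison ages (-1 and 1) are arbitrary. The linear predictor contains NO interaction term.

```r
# Hand-written inverse link for an m-alternative forced-choice probit:
# guessing floor 1/m, ceiling 1. `mafc_inv` is a hypothetical helper name.
mafc_inv <- function(eta, m = 2) 1 / m + (1 - 1 / m) * pnorm(eta)

mafc_inv(c(-Inf, 0, Inf))   # 0.50 0.75 1.00: accuracy cannot fall below chance

# Additive linear predictor (no interaction), coefficients rounded from the fit
eta <- function(age, group) 1.91 + 0.93 * age - 0.96 * group
p   <- function(age, group) mafc_inv(eta(age, group))

# Difference-in-differences on a chosen scale f:
# (group gap at age = 1) - (group gap at age = -1)
did <- function(f) {
  (f(p(1, 1)) - f(p(1, 0))) - (f(p(-1, 1)) - f(p(-1, 0)))
}

did(identity)                       # raw accuracy scale: positive
did(qlogis)                         # logit scale: negative
did(function(p) qnorm(2 * p - 1))   # 2AFC probit scale: zero up to rounding
```

The signs mirror the three fits: the same additive process looks like a positive interaction on the raw scale, a negative one on the logit scale, and none on the scale that matches the guessing floor.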
equal intervals on X correspond to equal intervals on Y
equal intervals on X correspond to equal ratios (NOT equal intervals) on Y
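The two statements can be checked numerically. The sketch assumes the first describes an identity link and the second a log link; the coefficients `b0` and `b1` are arbitrary illustrative values.

```r
x  <- 0:4                 # equally spaced values of X
b0 <- 1; b1 <- 0.5        # arbitrary illustrative coefficients

y_identity <- b0 + b1 * x         # identity link: mu = eta
y_log      <- exp(b0 + b1 * x)    # log link: mu = exp(eta)

diff(y_identity)                      # constant differences, equal to b1
y_log[-1] / y_log[-length(y_log)]     # constant ratios, equal to exp(b1)
diff(y_log)                           # differences are NOT constant
```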
Building a model means trying to recover the data-generating process which, unlike in the world of simulations, we can never know for sure
To do that we must make important decisions:
choosing the most appropriate family of distributions, so that the new values predicted for the dependent variable lie within its bounds
choosing the most appropriate link function: otherwise you are very likely to end up finding non-linear effects (i.e. interactions) that are not there!
We’re conducting a systematic review of how often inappropriate link functions are used in psychological research and lead to finding a significant interaction: so far, quite often
All materials are available on GitHub at sitalaura/link-functions
Questions and feedback: laura.sita@studenti.unipd.it
Special thanks to
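A negative interaction emerges (binomial family, probit link):
# accuracy is a proportion, so weights supply the number of trials (k, presumably trials per observation)
fit <- glm(accuracy ~ age * group, data = d, family = binomial(link = "probit"), weights = rep(k, nrow(d)))
summary(fit)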
Call:
glm(formula = accuracy ~ age * group, family = binomial(link = "probit"),
data = d, weights = rep(k, nrow(d)))
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.21133    0.03018  73.280  < 2e-16 ***
age          0.81113    0.02295  35.337  < 2e-16 ***
group1      -0.79152    0.03400 -23.279  < 2e-16 ***
age:group1  -0.11299    0.02637  -4.285 1.83e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8812.26 on 999 degrees of freedom
Residual deviance: 853.32 on 996 degrees of freedom
AIC: 2928.7
Number of Fisher Scoring iterations: 6
Cognitive Science Arena 2026